Local vs. global memory in the IBM RP3: experiments and performance modelling
Abstract
A number of experiments on the placement of instructions, private data, and shared data in the Non-Uniform Memory Access multiprocessor RP3 have been performed. Three scientific/mathematical workloads have been used in the experiments, and the results have been modelled in a simple performance model which takes linear contention into consideration. The results indicate that it may well be feasible not to have memory local to the processors in RP3-like architectures. There seems to be a trade-off between the effort spent on the design of the memory system and the interconnection network, and the use of local memory, which can be costly in terms of prohibited process migration and more complicated software management.
In the construction of highly parallel multiprocessors, different approaches have been adopted in order to achieve efficient access to main memory. Some use Uniform Memory Access (UMA) and rely on processor cache memories to make sure that performance is not degraded. Others, which have Non-Uniform Memory Access (NUMA), achieve this by placing pieces of the memory space in memory close to a processor, thus reducing the traffic in the interconnection network. The different architectures require different techniques to reduce network and memory contention. The IBM RP3 multiprocessor can be said to belong to both of these classes, with its capability to configure its memory as local memory, which is costly to access by remote processors, and/or global memory, which is almost uniformly accessible by all processors. It has processor caches and a high-bandwidth, low-latency multistage interconnection network. Two of the most commonly mentioned advantages of shared-memory multiprocessors are ease of programming and load balancing. Programming is easier because it does not matter where shared data are located, and with shared memory equally accessible from all processors, it makes no difference on which processor a specific thread executes (in this paper, the term thread denotes the smallest concurrently executable unit; see Section 2.2). A thread may even move between processors in order to utilize them more efficiently.
A uniform memory access multiprocessor architecture has the processors on one side of the interconnection network and the memory on the other side. All memory references have to go through the interconnection network. There may be local processor caches in order to minimize the traffic through the interconnection network. In contrast, in the NUMA architecture there is memory close to each processor accessible …